[[(book)Bowles_3.3_rocks | Measuring the Performance of Predictive Models]]

Performance Measures for Different Types of Problems

Performance measures for regression problems

  • mean squared error (MSE)
  • mean absolute error (MAE)
  • root MSE (RMSE, which is the square root of MSE)
  • Listing 3-1: Comparison of MSE, MAE and RMSE—regressionErrorMeasures.py
  • variance (mean squared deviation from the mean)
  • standard deviation (square root of variance)
  • """For example, if the MSE of the prediction error is roughly the same as the target variance (or the RMSE is roughly the same as target standard deviation), the prediction algorithm is not performing well. You could replace the prediction algorithm with a simple calculation of the mean of the targets and perform as well.
  • """The errors in Listing 3-1 have RMSE that’s about half the standard deviation of the targets. That is fairly good performance.
  • histogram of the error
  • tail behavior (quantile or decile boundaries)
  • degree of normality
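The regression measures above can be sketched in a few lines. This is a minimal illustration on toy data, not the book's `regressionErrorMeasures.py` listing; the function names and sample values are my own.

```python
import math

def regression_errors(targets, predictions):
    """Return (MSE, MAE, RMSE) for paired targets and predictions."""
    errors = [t - p for t, p in zip(targets, predictions)]
    mse = sum(e * e for e in errors) / len(errors)
    mae = sum(abs(e) for e in errors) / len(errors)
    return mse, mae, math.sqrt(mse)

def mean_variance_std(values):
    """Return (mean, variance, standard deviation) of the targets."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var, math.sqrt(var)

# Toy data (not from the book), to compare RMSE against the target std.
targets = [1.0, 2.0, 3.0, 4.0, 5.0]
predictions = [1.1, 1.9, 3.2, 3.8, 5.1]

mse, mae, rmse = regression_errors(targets, predictions)
_, var, std = mean_variance_std(targets)

print(f"MSE={mse:.4f}  MAE={mae:.4f}  RMSE={rmse:.4f}")
print(f"target variance={var:.4f}  target std={std:.4f}")
# Rule of thumb from the text: if RMSE is close to the target std,
# the model is doing no better than always predicting the mean.
```

Here RMSE (~0.15) is about a tenth of the target standard deviation (~1.41), so by the book's rule of thumb the predictions carry real information.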

Classification problems

  • misclassification error rates
  • Generally, algorithms for doing classification can present predictions in the form of a probability instead of a hard click versus not-click decision. The algorithms considered in this book all output probabilities ... the data scientist has the option to use 50 percent as a threshold
  • confusion matrix or contingency table
    • confusionMatrix() ... takes the predictions, the corresponding actual values (labels), and a threshold value as input
  • receiver operating characteristic (ROC)
    • The ROC curve plots the true positive rate (abbreviated TPR) versus the false positive rate (FPR).
    • JB: does ROC also work with the misclassification rate?
  • area under the curve (AUC)
    • A perfect classifier has an AUC of 1.0
    • random guessing has an AUC of 0.5
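The classification measures above can also be sketched without any library. This is my own minimal version, not the book's `confusionMatrix()` or its `classifierPerformance_RocksVMines.py` listing; the AUC here uses the rank (Mann-Whitney) formulation rather than plotting the ROC curve.

```python
def confusion_counts(labels, probs, threshold):
    """Count (TP, FP, TN, FN) at a probability threshold; positive class = 1."""
    tp = fp = tn = fn = 0
    for label, p in zip(labels, probs):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def roc_auc(labels, probs):
    """AUC as the probability that a random positive example is scored
    higher than a random negative one (ties count half)."""
    pos = [p for l, p in zip(labels, probs) if l == 1]
    neg = [p for l, p in zip(labels, probs) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: a classifier that ranks all positives above all negatives.
labels = [1, 1, 0, 0]
probs = [0.9, 0.8, 0.2, 0.1]
print(confusion_counts(labels, probs, 0.5))  # 50 percent threshold
print(roc_auc(labels, probs))                # perfect ranking -> 1.0
```

Sweeping the threshold from 1 to 0 and recording (FPR, TPR) at each step traces out the ROC curve; random guessing gives AUC 0.5 because positives and negatives win the pairwise comparison equally often.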

Simulating Performance of Deployed Models

training set

test set

[validation set?]

  • used for n-fold cross-validation?
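The train/test split idea generalizes to n-fold cross-validation: partition the data into n folds and let each fold serve once as the test set. A minimal index-splitting sketch (my own, not from the book; it ignores shuffling and stratification):

```python
def n_fold_splits(n_samples, n_folds):
    """Yield (train_indices, test_indices) for each of n_folds folds.
    The last fold absorbs any remainder when n_samples % n_folds != 0."""
    indices = list(range(n_samples))
    fold_size = n_samples // n_folds
    for k in range(n_folds):
        start = k * fold_size
        stop = (k + 1) * fold_size if k < n_folds - 1 else n_samples
        test = indices[start:stop]
        train = indices[:start] + indices[stop:]
        yield train, test

# Usage: train on each train split, score on the held-out fold,
# then average the n scores to estimate deployed performance.
for train, test in n_fold_splits(10, 5):
    print(test)  # each sample appears in exactly one test fold
```

Averaging the per-fold scores uses all the data for both training and testing, which is why cross-validation is a common way to simulate the performance of a deployed model.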


  • CODE
    • Listing 3-1: Comparison of MSE, MAE and RMSE—regressionErrorMeasures.py
    • Figure 3-9: Confusion matrix example
    • Listing 3-2: Measuring Performance for Classifier Trained on Rocks-Versus-Mines—classifierPerformance_RocksVMines.py
    • Table 3-2: Dependence of Misclassification Error on Decision Threshold
    • Table 3-3: Cost of Mistakes for Different Decision Thresholds
    • Figure 3-10: In-sample ROC for rocks-versus-mines classifier
    • Figure 3-11: Out-of-sample ROC for rocks-versus-mines classifier